Getting Started: R and BigQuery
This slide deck was built in Quarto!
- Use keyboard arrow keys to
- advance ( → ) and
- go back ( ← )
- Type “s” to see speaker notes
- Type “?” to see other keyboard shortcuts
Progress So Far
So far, you’ve:
- Signed up for GCP
- Created a project
- Added BigQuery to that project
- Looked at public data in another project
- Used that data to create a query and a table in your project
What’s Left?
- Exporting your table for use elsewhere
- Connecting directly to BigQuery from RStudio
- (Maybe) Vertex AI Workbench
Let’s Export!
Getting Query Results to Save
Since you’ve saved the query you used to create this table, you can re-run that query.
Or, since you’ve created a table that contains all the data you want, you run a “give me everything” query:
SELECT * FROM [table_name]
Saving Data Exercise
Run a query – either your saved query or a “SELECT * FROM…” on your saved table.
Then Click the Save Results button and save the .csv to your computer.
R vs RStudio
Quick reminder: R is a language, that can be run in many settings:
- from the command line
- in an automation (like a cron job)
- from the R IDE (very bare bones)
- from the RStudio IDE
- within a notebook like a Jupyter notebook
RStudio is a fully featured IDE that runs on Linux, Windows, and Mac. It’s much more heavy weight than just the language.
Working with RStudio IDE
Many of us love RStudio. You can spin up an RStudio server in GCP… but you don’t have to, and it might mean spending money when you don’t need to.
You can use your normal RStudio (on your computer or in Posit.cloud) to work with BigQuery.
Using non-GCP RStudio
Please open whatever RStudio you typically use. https://posit.cloud is a good option!
When you get there, open a new Quarto or R Markdown document.
Exercise
Please install bigrquery as well as tidyverse, if you don’t already have that installed.
Then you’re going to start a new Quarto or R Markdown document and add a code chunk that loads these two libraries:
Authenticating
Now you need to authenticate so that RStudio can connect to BigQuery. You’ll next type:
bq_auth()
Use the same Google Identity you’re using for GCP. If it suggests installing httpuv, do it, as otherwise authorization might fail.
Step 1: Store (or not) Credential
Step 2: Sign in With Correct Identity
Step 3: Give permissions
![]()
![]()
Step 4: Get Authorization Code
You might or might not see this – you might get this code passed back automatically with a message that says
Authentication complete. Please close this page and return to R.
Step 5: Enter Auth Code
Run a Query from RStudio
The pattern is:
- Set your
project_id and my_sql_query objects
- Send that query to BigQuery using
results <- bq_project_query(project_id, my_sql_query)
- Get the table via
df <- bq_table_download(results)
Take a look!
Visit https://github.com/pm0kjp/rmedicine_2024_bigquery/blob/main/bq_demo.qmd and get the code, but we’ll also do this together bit by bit.
Vertex AI Workbench
![]()
::: note In the burger menu, select Vertex AI, then Workbench. Enable the api.
:::
Setting up R kernel
In a new Terminal in your Jupyter instance, enter the following:
conda create -n r
conda config --add channels conda-forge
conda install -c conda-forge r-base
conda install -c conda-forge r-essentials
conda install -c conda-forge r-tidyverse
conda install -c conda-forge r-stringr
conda install -c conda-forge r-gargle
conda install -c conda-forge r-bigrquery
conda activate r
Answer “y” when prompted.
Start Notebook with R kernel
File > New > Notebook
Select “R” as the kernel